IMPLEMENTING THE CRITERION REFERENCED USAF APPRENTICE KNOWLEDGE
TEST PROGRAM

1Lt Christopher M. Antons
USAF Occupational Measurement Center


An Air Force Apprentice Knowledge Test (AKT) is designed to measure specialty
knowledge at the three-skill or apprentice level of a specific Air Force enlisted specialty.
The Occupational Test Development Branch of the USAF Occupational Measurement
Center (USAFOMC) is responsible for developing and maintaining the AKTs. AKTs are
used in conjunction with other factors to select airmen for bypass of technical training
and direct entry into a specific career field at the apprentice level.

Prior to 1976, AKTs were 65-item multiple choice tests with passing scores set
annually at the thirtieth percentile of the score distribution for all examinees who had
previously taken a specific AKT. Inherent to the method, passing scores fluctuated,
sometimes dramatically, depending upon the examinee population for a given year. AKT
use in some specialties was very low, thereby severely limiting the reliability of the
passing score. Conversely, for high usage AKTs, a change of only one point in the
calculated passing score on this relatively short test meant a large difference in the total
number of airmen passing or failing. For the few 65-item AKTs still in existence, the
passing scores range from 19 to 26 raw score points. The most severe limitation of this
method was that the passing scores were established relative to the examinee
population without reference to job incumbents or expected performance.

To improve the AKTs, USAFOMC initiated a series of studies. In the first study,
AKT scores were compared for three groups: beginning trainees and graduates of a
technical training course for general vehicle maintenance, and airmen already selected for
bypass in that specialty (Vaughan, 1976a). Mean scores for both the beginning trainees
and the bypass group were significantly lower than the mean score for graduates. Differences
in scores of beginning trainees and graduates showed the test was able to discriminate
among levels of knowledge for a specialty. Differences in scores of the bypass group and
graduates demonstrated specialty knowledge differences between a group seeking
apprentice skill level and a group just completing formal technical training. In comparison, the
score at the tenth percentile of graduates was the same as the score at the seventy-fifth
percentile of the bypass group. Using the score just above the tenth percentile as a
passing score, some airmen previously selected as bypass specialists would not be
qualified.

A second study replicated the first study on an additional five Air Force specialties
and found similar results (Vaughan, 1976b). Based on the results of these studies, the
USAFOMC implemented a criterion referenced testing program for AKTs using technical
training graduates as the criterion group and the tenth percentile of that group as the
passing score. The rationale given for originally setting the passing score above the tenth
percentile was that extremely low scores are likely to contain considerable error (Lord
and Novick, 1970). Conversely, a higher passing score was decided against since it might
prevent acceptable performers from being selected to bypass training. Performance of
technical school graduates and selected bypass specialists from one of the five specialties
in the previous study was compared (Vaughan, 1978). Performance of the bypass
specialists was shown to be equal to or slightly better than that of the technical school graduates.
This evidence supported the decision not to set the passing score any higher than just above
the tenth percentile.
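
The passing-score rule described above can be sketched in a few lines of Python. This is
a minimal illustration, not the USAFOMC procedure: the function name
`criterion_passing_score`, the sample scores, and the convention used for "just above
the tenth percentile" (the first sorted score beyond the lowest ten percent of the
criterion group) are all assumptions made for the example.

```python
import math


def criterion_passing_score(criterion_scores, percentile=10):
    """Return the criterion-group score just above the given percentile.

    Assumed convention: the lowest `percentile` percent of the sorted
    scores are treated as lying at or below the percentile, and the next
    score up is taken as the passing score.
    """
    if not criterion_scores:
        raise ValueError("criterion group is empty")
    ordered = sorted(criterion_scores)
    n_below = math.floor(len(ordered) * percentile / 100)
    # Guard against running off the end of a very small group.
    return ordered[min(n_below, len(ordered) - 1)]


# Hypothetical raw scores for 20 technical school graduates (100-item test).
graduates = [38, 41, 45, 47, 48, 50, 52, 53, 55, 56,
             57, 58, 60, 61, 63, 64, 66, 68, 71, 75]

# With 20 graduates, the lowest two scores fall within the bottom ten percent.
passing_score = criterion_passing_score(graduates)
print(passing_score)
```

Because the cutoff depends only on the graduates' scores, it does not move when the mix
of bypass, retraining, or upgrade examinees changes from year to year.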

In 1976, the USAFOMC began converting all AKTs to 100 items and criterion
referencing those tests with high usage (greater than 25 administrations per year) and
a large enough criterion group of technical school graduates. All AKTs were expanded to
100 items to increase their reliability. The criterion referencing anchored the
performance of bypass specialist candidates on the AKT to a known level of performance of
technical school graduates. This allowed us to assume that successful bypass candidates had
at least as much knowledge as the lower ten percent of technical school graduates for a
given specialty.

Two main problems were encountered. First, we assumed that a few members of
the examinee group would lack motivation for testing since they had just graduated, were
preparing to depart for duty assignments, and were aware that the test had no impact on
their own training. The USAFOMC explained the significance of this testing to training
personnel and, in turn, the graduating trainees. This helped dispel the motivation
problem. The second problem involved subject-matter experts (senior noncommissioned
officers brought to the USAFOMC from working units in each specialty to provide input on
content of the tests). They wanted to increase the difficulty of the tests to ensure that
bypass specialists would be knowledgeable. Test developers at the USAFOMC explained
that increasing the difficulty of the test would also decrease the average score of the
criterion group. With a lower mean criterion score, the passing score would be set lower.
If set low enough, some examinees might achieve a passing score by chance alone.
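
The test developers' point about chance scores can be made concrete with some binomial
arithmetic. The sketch below assumes four answer choices per item, which the text does
not state, and treats blind guessing as independent across items. The helper names and
the three cutoffs tried are hypothetical and chosen only for illustration.

```python
from math import comb


def expected_chance_score(n_items, n_options=4):
    """Mean raw score from blind guessing (n_options is an assumption)."""
    return n_items / n_options


def prob_pass_by_guessing(n_items, passing_score, n_options=4):
    """Binomial probability of reaching the passing score by guessing alone."""
    p = 1 / n_options
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(passing_score, n_items + 1)
    )


# On a 100-item test, compare a cutoff near the chance score with higher ones.
print(expected_chance_score(100))
for cutoff in (26, 35, 45):  # hypothetical passing scores
    print(cutoff, prob_pass_by_guessing(100, cutoff))
```

Under these assumptions, a passing score that drifts down toward the expected chance
score gives a guessing examinee a substantial probability of passing, while a cutoff well
above it makes passing by chance very unlikely.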

Current Status


As of September 1982, the following numbers of AKTs are available across 270 specialties.

TYPE                      NUMBER OF SPECIALTIES   AVERAGE USAGE

Criterion referenced      87                      86
Noncriterion referenced   45                      25
No test                   158

The AKT program now includes both criterion and noncriterion referenced tests. All
AKTs are criterion referenced unless annual usage is too low to justify the criterion
referencing. For many specialties, no AKT is constructed because training is mandatory
or for other reasons specific to the individual specialties.

The following table provides information on usage of AKTs. Airmen take the
examinations to bypass technical training when first entering the service (bypass) or when
changing from one specialty to another (retraining), or to demonstrate apprentice level
competency after a period of on-the-job training (upgrade).

AKT UTILIZATION

tcVair

USE          TOTAL TESTED   PERCENT PASS

Bypass       3517           6m
Retraining   1771           82%
Upgrade      1951           84%
Total        7299           75%

(Jan-Jun 82)

USE          TOTAL TESTED   PERCENT PASS

Bypass       934            59%
Retraining   1334           80%
Upgrade      962            03%

The following table provides information on the passing scores established for both the
criterion and noncriterion referenced AKTs.

PASSING SCORE DISTRIBUTIONS

                                    N    Mean    SD     Range   % Passing

Criterion referenced                81   42.96   8.37   26-60   70%
Noncriterion referenced             29   42.28   6.20   30-56   77%
(65 item) Noncriterion referenced   6    24.50   2.74   19-26   85%

The average passing scores are nearly the same for criterion and noncriterion
referenced tests. The difference, then, is not in placement of the passing score, but in
the criterion that determines that score and the distribution of scores for that criterion
group. As will be shown later, score distributions for the criterion groups are much less
varied than for the examinee groups. Also, the percentages of examinees achieving passing
scores on criterion and noncriterion referenced tests are nearly the same. According to the
criteria for setting the passing score on noncriterion referenced tests, only 70% of examinees
should pass. However, as stated earlier, the passing scores can fluctuate from year to
year according to the population of examinees, and the number of examinees passing
depends upon the distribution of scores for one group compared to all past examinees.
For the 65-item noncriterion referenced tests, there is more opportunity for fluctuation
in scores from year to year and for examinees to achieve passing scores by chance.
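
Why the noncriterion referenced pass rate need not equal 70% can be illustrated with a
short Python sketch. The scores below are invented for the example, and the percentile
convention (the first sorted score beyond the lowest thirty percent of past examinees)
is an assumption rather than the documented USAFOMC computation.

```python
import math


def norm_referenced_cutoff(past_scores, percentile=30):
    """Passing score at the given percentile of all past examinees.

    Assumed convention: the first sorted score beyond the lowest
    `percentile` percent of past examinees.
    """
    if not past_scores:
        raise ValueError("no past examinees")
    ordered = sorted(past_scores)
    index = math.floor(len(ordered) * percentile / 100)
    return ordered[min(index, len(ordered) - 1)]


def pass_rate(scores, cutoff):
    """Fraction of a group scoring at or above the cutoff."""
    return sum(score >= cutoff for score in scores) / len(scores)


# Hypothetical raw scores: all past examinees, then one stronger current group.
past_examinees = [20, 24, 28, 30, 33, 35, 38, 41, 45, 50]
current_group = [29, 31, 34, 36, 40, 42, 44, 47, 51, 55]

cutoff = norm_referenced_cutoff(past_examinees)
print(cutoff, pass_rate(past_examinees, cutoff), pass_rate(current_group, cutoff))
```

The cutoff passes 70% of the past examinees by construction, but a current group whose
scores are distributed differently passes at a different rate.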

Some Specific Criterion Referenced AKTs

Six criterion referenced AKTs were selected for analysis of both the criterion
group and examinee group scores. Of these, two were from specialties previously
studied by Vaughan (1976a, 1978), two were selected for having
extremely low passing rates, and two were selected for having extremely high passing
rates. Air Force Specialty Code (AFSC) 47230, Apprentice Base Vehicle Equipment
Mechanic, is similar to the general mechanic specialty examined by Vaughan (1976a). The
passing score of 45 on this test is close to the average of 43 for all criterion referenced
AKTs. In the 1976 study, the tenth percentile of the criterion group was the 75th
percentile of the bypass group. In this case, the tenth percentile of the criterion group is the
47th percentile of the bypass group, and the test is much more selective for the bypass
group than the retraining group. For the purposes of the AKT, these characteristics are
desirable.

AFSC 90230, Apprentice Medical Service Specialist, was used in the performance
measurement study (Vaughan, 1978) and the criterion referencing study (Vaughan, 1976b).
The passing score of 46 is also near the average for all criterion referenced AKTs. On this
test, 33% of the bypass group scored below passing, compared to 58% in the 1976 study.
Though the means of the examinee and criterion groups were nearly the same, the examinee
group had the greater variance. The variance was not due to subgroups, since the bypass and
retrain groups both had large variances, with standard deviations twice the difference
of their means. The upgrade group had less variance but was a smaller group and had a
mean similar to the bypass group. What was notable was the large percent passing in the
bypass group, indicating that, for this career field, civilian experience may provide
adequate background. Considering the variance of the examinee groups, the cutoff scores
were able to discriminate among examinees despite the similarity of examinee and
criterion mean scores.

(Figure: AFSC 55330, Apprentice Engineering Assistant Specialist; horizontal axis: raw score)

                  n     Mean    S.D.    10th Percentile

Criterion Group   224   70.40   10.7&   55
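
Statements such as "the tenth percentile of the criterion group is the 47th percentile
of the bypass group" or "33% of the bypass group scored below passing" amount to locating
a passing score within an examinee group's score distribution. A minimal Python sketch
follows. The bypass scores are invented, the function name `percent_below` is
hypothetical, and counting only scores strictly below the cutoff is an assumed
convention. Only the passing score of 45 is taken from the text.

```python
def percent_below(scores, cutoff):
    """Percentage of a group scoring strictly below the cutoff."""
    if not scores:
        raise ValueError("group is empty")
    return 100 * sum(score < cutoff for score in scores) / len(scores)


# Hypothetical raw scores for 20 bypass examinees on a 100-item AKT.
bypass_scores = [22, 27, 30, 33, 36, 38, 40, 42, 44, 45,
                 46, 48, 50, 52, 55, 57, 60, 63, 67, 72]

cutoff = 45  # passing score quoted in the text for AFSC 47230
failing_pct = percent_below(bypass_scores, cutoff)
passing_pct = 100 - failing_pct
print(failing_pct, passing_pct)
```

Running the same function on the retraining and upgrade groups would show how selective
a single criterion-based cutoff is for each type of examinee.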

Two specialties, Plumbing and Engineering Assistant, showed high failure rates.
Passing scores of 60 and 56 respectively were relatively high. Again, the examinee scores
were highly varied. For the plumbing specialty, although the failure rate for the bypass
group is the highest, the failure rate for the retraining group is also high. This suggests
that those retraining may be coming from a variety of career fields and do not have the
background knowledge required. For the engineering assistant specialty there is a much
higher failure rate in the bypass group than in the retrain group. This suggests that the
knowledge required for this specialty may not be acquired in a related civilian job, or
there may not be a related civilian job. Again, the retrain group has a high failure rate,
which may suggest that those retraining are coming from a variety of career fields and lack
the needed background knowledge. Also, the criterion group scores are higher than
typical. Content of the test and training quality and emphasis may contribute to this effect.

(Figure: AFSC 24130, Apprentice Safety Specialist)

AFSCs 20630, Apprentice Imagery Interpreter, and 24130, Apprentice Safety
Specialist, had AKTs with few or no failures. Both were different from the other AKTs
studied in that only one test was administered for bypass and most tests were administered to
Air Force Reserve and National Guard members either for retraining or upgrading
purposes. Higher examinee means would be expected for these groups than for bypass
groups. Airmen taking these exams may have already worked in the specialty or a very
closely related specialty. In the case of the Safety specialty, some knowledge of that
field is required for all specialties.

In general, all six AKTs exhibited two distinct characteristics. First, the variance
in the distribution of scores was always greater for the examinee group than the criterion
group. Though it can be expected that the criterion group, having just completed training
in a specialty, would not vary much on a test covering that specialty, it was somewhat
less expected that the examinee group scores would vary so much more than the criterion group.
For the Apprentice Medical Service Specialist test, the standard deviation for the
examinee group was nearly five points greater than for the criterion group. Second, in
looking at the subgroups of examinees, those taking the test for retraining and those for
upgrading always had higher mean scores than those in basic training trying to bypass
technical training. This result can be expected since those airmen retraining and testing
for upgrading have been in the Air Force for a period of time already and have had an
opportunity for more specific experience or, in the case of testing for upgrading, have
been through on-the-job training in the specialty. These characteristics indicate that the
criterion referenced tests are able to discriminate across varied groups of examinees.

Conclusions 


For the AKTs analyzed, the higher means and relatively small standard deviations
of the criterion groups provide a more precise pass/fail cutoff. It can be seen from the
score distributions that when the scores at and below the tenth percentile for the
criterion group are eliminated, the criterion group has greater homogeneity, so that selection
for bypass or retraining is similar to membership versus nonmembership in the criterion
group rather than achieving a specified criterion percentile across a distribution of
criterion scores. This serves the expressed purpose of the AKTs: to provide a means of
selecting or not selecting an individual to bypass technical training.

For those AKTs with either very low or very high passing rates, criterion
referenced tests were able to discriminate where the noncriterion referenced tests would
have allowed too many or too few passing scores respectively.

Recommendations 


Given large differences in mean scores of examinee and criterion groups, it is
difficult to determine the validity of very high or very low passing rates. Performance studies
of the bypass groups (Vaughan, 1978) should provide validity for the criterion cutoff
scores. We are directing future research toward this goal.

Additionally, the high variance of examinee groups analyzed indicates a need for
screening potential examinees. Given the wide range of examinee scores, some tests may
be administered to airmen lacking the appropriate background knowledge or experience
needed for a specialty. This suggests overuse of the tests and a need for a better screening
procedure.